
    DAC: Detector-Agnostic Spatial Covariances for Deep Local Features

    Current deep visual local feature detectors do not model the spatial uncertainty of detected features, producing suboptimal results in downstream applications. In this work, we propose two post-hoc covariance estimates that can be plugged into any pretrained deep feature detector: a simple, isotropic covariance estimate that uses the predicted score at a given pixel location, and a full covariance estimate via the local structure tensor of the learned score maps. Both methods are easy to implement and can be applied to any deep feature detector. We show that these covariances are directly related to errors in feature matching, leading to improvements in downstream tasks, including solving the perspective-n-point problem and motion-only bundle adjustment. Code is available at https://github.com/javrtg/DA
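
    A minimal PyTorch sketch of the two estimates may help make them concrete. Assumptions not stated in the abstract: integer (x, y) keypoint coordinates, Sobel gradients, a 5x5 averaging window, and the exact inverse-score and inverse-tensor scalings.

```python
import torch
import torch.nn.functional as F

def isotropic_covariance(score_map, keypoints, eps=1e-8):
    """2x2 isotropic covariance per keypoint, with variance taken to be
    inversely proportional to the detector score at that pixel.
    score_map: (H, W) tensor; keypoints: (N, 2) integer (x, y) coordinates."""
    scores = score_map[keypoints[:, 1], keypoints[:, 0]]             # (N,)
    var = 1.0 / (scores + eps)
    return var[:, None, None] * torch.eye(2).expand(len(var), 2, 2)

def structure_tensor_covariance(score_map, keypoints, eps=1e-8):
    """Full 2x2 covariance from the local structure tensor of the score map,
    inverted so that sharp, well-localized peaks yield small uncertainty."""
    s = score_map[None, None]                                        # (1,1,H,W)
    sobel = torch.tensor([[[[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]]])
    gx = F.conv2d(s, sobel, padding=1)
    gy = F.conv2d(s, sobel.transpose(-1, -2), padding=1)
    w = torch.ones(1, 1, 5, 5) / 25.0        # local averaging window
    Jxx = F.conv2d(gx * gx, w, padding=2)[0, 0]
    Jxy = F.conv2d(gx * gy, w, padding=2)[0, 0]
    Jyy = F.conv2d(gy * gy, w, padding=2)[0, 0]
    x, y = keypoints[:, 0], keypoints[:, 1]
    J = torch.stack([torch.stack([Jxx[y, x], Jxy[y, x]], -1),
                     torch.stack([Jxy[y, x], Jyy[y, x]], -1)], -2)   # (N,2,2)
    return torch.linalg.inv(J + eps * torch.eye(2))
```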

    Bayesian Triplet Loss: Uncertainty Quantification in Image Retrieval

    Uncertainty quantification in image retrieval is crucial for downstream decisions, yet it remains a challenging and largely unexplored problem. Current methods for estimating uncertainties are poorly calibrated, computationally expensive, or based on heuristics. We present a new method that views image embeddings as stochastic rather than deterministic features. Our two main contributions are (1) a likelihood that matches the triplet constraint and evaluates the probability of an anchor being closer to a positive than to a negative; and (2) a prior over the feature space that justifies the conventional l2 normalization. To ensure computational efficiency, we derive a variational approximation of the posterior, called the Bayesian triplet loss, that produces state-of-the-art uncertainty estimates while matching the predictive performance of current state-of-the-art methods.
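
    The likelihood in contribution (1) can be illustrated with a short Monte Carlo sketch under diagonal-Gaussian embeddings; the paper instead derives a closed-form variational approximation, so the sampling below is only an illustrative stand-in with the same semantics.

```python
import torch

def p_anchor_closer(mu_a, var_a, mu_p, var_p, mu_n, var_n, n_samples=1024):
    """P(||a - p||^2 < ||a - n||^2) for diagonal-Gaussian embeddings
    a ~ N(mu_a, var_a), p ~ N(mu_p, var_p), n ~ N(mu_n, var_n),
    each mu/var of shape (D,)."""
    def sample(mu, var):
        return mu + var.sqrt() * torch.randn(n_samples, *mu.shape)
    a, p, n = sample(mu_a, var_a), sample(mu_p, var_p), sample(mu_n, var_n)
    d_pos = ((a - p) ** 2).sum(-1)   # squared distance anchor-positive
    d_neg = ((a - n) ** 2).sum(-1)   # squared distance anchor-negative
    return (d_pos < d_neg).float().mean()
```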

    Nerfbusters: Removing Ghostly Artifacts from Casually Captured NeRFs

    Casually captured Neural Radiance Fields (NeRFs) suffer from artifacts such as floaters or flawed geometry when rendered outside the camera trajectory. Existing evaluation protocols often do not capture these effects, since they usually only assess image quality at every 8th frame of the training capture. To push forward progress in novel-view synthesis, we propose a new dataset and evaluation procedure in which two camera trajectories are recorded per scene: one used for training, and the other for evaluation. In this more challenging in-the-wild setting, we find that existing hand-crafted regularizers neither remove floaters nor improve scene geometry. We therefore propose a 3D diffusion-based method that leverages local 3D priors and a novel density-based score distillation sampling loss to discourage artifacts during NeRF optimization. We show that this data-driven prior removes floaters and improves scene geometry for casual captures.
    Comment: ICCV 2023, project page: https://ethanweber.me/nerfbuster
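
    A minimal sketch of the two evaluation protocols contrasted above, assuming frames are given as ordered lists; the function name is hypothetical.

```python
def make_eval_split(train_frames, eval_frames=None, stride=8):
    """With a second capture of the same scene (eval_frames), use the proposed
    protocol: train on one trajectory, evaluate on the other. Otherwise fall
    back to the conventional every-8th-frame split of the training capture."""
    if eval_frames is not None:
        return train_frames, eval_frames
    test = train_frames[::stride]
    train = [f for i, f in enumerate(train_frames) if i % stride != 0]
    return train, test
```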

    K-Planes: Explicit Radiance Fields in Space, Time, and Appearance

    We introduce k-planes, a white-box model for radiance fields in arbitrary dimensions. Our model uses d choose 2 planes to represent a d-dimensional scene, providing a seamless way to go from static (d=3) to dynamic (d=4) scenes. This planar factorization makes it easy to add dimension-specific priors, e.g. temporal smoothness and multi-resolution spatial structure, and induces a natural decomposition of the static and dynamic components of a scene. We use a linear feature decoder with a learned color basis that yields performance similar to a nonlinear black-box MLP decoder. Across a range of synthetic and real, static and dynamic, fixed- and varying-appearance scenes, k-planes yields competitive and often state-of-the-art reconstruction fidelity with low memory usage, achieving 1000x compression over a full 4D grid, and fast optimization with a pure PyTorch implementation. For video results and code, please see the project page: https://sarafridov.github.io/K-Planes
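
    The planar factorization is compact enough to sketch in a few lines of PyTorch for the static d=3 case. The grid resolution, feature width, and initialization below are illustrative assumptions; plane features are combined by elementwise multiplication, in line with k-planes' Hadamard-product design.

```python
import itertools
import torch
import torch.nn.functional as F

class KPlanesField(torch.nn.Module):
    """One feature plane per pair of axes (d choose 2 planes in total)."""
    def __init__(self, d=3, resolution=128, features=32):
        super().__init__()
        self.pairs = list(itertools.combinations(range(d), 2))
        self.planes = torch.nn.ParameterList(
            [torch.nn.Parameter(0.1 * torch.randn(1, features, resolution, resolution))
             for _ in self.pairs])

    def forward(self, coords):
        """coords: (N, d) points in [-1, 1]^d. Returns (N, features) obtained
        by bilinearly interpolating each plane and multiplying the results."""
        feat = 1.0
        for (i, j), plane in zip(self.pairs, self.planes):
            grid = coords[:, [i, j]][None, None]                  # (1, 1, N, 2)
            f = F.grid_sample(plane, grid, align_corners=True)    # (1, C, 1, N)
            feat = feat * f[0, :, 0].T                            # (N, C)
        return feat
```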

    Probabilistic Spatial Transformers for Bayesian Data Augmentation

    High-capacity models require vast amounts of data, and data augmentation is a common remedy when this resource is limited. Standard augmentation techniques apply small hand-tuned transformations to existing data, which is a brittle process that realistically only allows for simple transformations. We propose a Bayesian interpretation of data augmentation where the transformations are modelled as latent variables to be marginalized, and show how these can be inferred variationally in an end-to-end fashion. This allows for significantly more complex transformations than manual tuning, and the marginalization implies a form of test-time data augmentation. The resulting model can be interpreted as a probabilistic extension of spatial transformer networks. Experimentally, we demonstrate improvements in accuracy and uncertainty quantification in image and time series classification tasks.
    Comment: Submitted to the International Conference on Machine Learning (ICML), 202
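
    The marginalization can be sketched as Monte Carlo averaging over sampled transformations applied with a standard spatial transformer; the Gaussian affine parameterization and the sample count are assumptions, and the paper's variational inference is replaced here by plain sampling.

```python
import torch
import torch.nn.functional as F

def marginalized_predict(classifier, x, theta_mu, theta_logvar, n_samples=8):
    """x: (B, C, H, W) inputs; theta_mu, theta_logvar: (B, 2, 3) parameters of
    a Gaussian over affine transformations. Averages class probabilities over
    sampled transformations, which doubles as learned test-time augmentation."""
    probs = 0.0
    for _ in range(n_samples):
        # Reparameterized sample of the latent transformation.
        theta = theta_mu + (0.5 * theta_logvar).exp() * torch.randn_like(theta_mu)
        grid = F.affine_grid(theta, list(x.shape), align_corners=False)
        x_t = F.grid_sample(x, grid, align_corners=False)
        probs = probs + classifier(x_t).softmax(-1)
    return probs / n_samples
```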

    Learning to Taste: A Multimodal Wine Dataset

    We present WineSensed, a large multimodal wine dataset for studying the relations between visual perception, language, and flavor. The dataset encompasses 897k images of wine labels and 824k reviews of wines curated from the Vivino platform. It has over 350k unique vintages, annotated with year, region, rating, alcohol percentage, price, and grape composition. We obtained fine-grained flavor annotations on a subset by conducting a wine-tasting experiment with 256 participants who were asked to rank wines based on their similarity in flavor, resulting in more than 5k pairwise flavor distances. We propose a low-dimensional concept embedding algorithm that combines human experience with automatic machine similarity kernels. We demonstrate that this shared concept embedding space improves upon separate embedding spaces for coarse flavor classification (alcohol percentage, country, grape, price, rating) and aligns with the intricate human perception of flavor.
    Comment: Accepted to NeurIPS 2023. See project page: https://thoranna.github.io/learning_to_taste
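
    The abstract does not spell out the embedding algorithm, so the sketch below only illustrates the general idea: blend the human pairwise flavor distances with a machine similarity kernel and embed the result in low dimension, here via classical MDS. The blending weight and the choice of MDS are assumptions.

```python
import numpy as np

def concept_embedding(human_dist, machine_feats, alpha=0.5, dim=2):
    """human_dist: (n, n) pairwise flavor distances from the tasting experiment;
    machine_feats: (n, p) machine features (e.g. image/text embeddings).
    Returns an (n, dim) shared embedding."""
    # Machine distances from a cosine-similarity kernel.
    f = machine_feats / np.linalg.norm(machine_feats, axis=1, keepdims=True)
    machine_dist = 1.0 - f @ f.T
    d = alpha * human_dist + (1.0 - alpha) * machine_dist
    # Classical MDS: double-center the squared distances, then eigendecompose.
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    vals, vecs = np.linalg.eigh(b)
    top = np.argsort(vals)[::-1][:dim]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))
```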

    Expression of Transketolase-like gene 1 (TKTL1) predicts disease-free survival in patients with locally advanced rectal cancer receiving neoadjuvant chemoradiotherapy

    Background: For patients with locally advanced rectal cancer (LARC), neoadjuvant chemoradiotherapy is recommended as standard therapy. So far, no predictive or prognostic molecular factors have been established for patients undergoing multimodal treatment. Increased angiogenesis and altered tumour metabolism, as an adaptation to hypoxic conditions, play an important role in tumour progression and metastasis. Enhanced expression of vascular endothelial growth factor receptor (VEGF-R) and Transketolase-like 1 (TKTL1) is related to hypoxic conditions in tumours. In search of potential prognostic molecular markers, we investigated the expression of VEGFR-1, VEGFR-2 and TKTL1 in patients with LARC treated with neoadjuvant chemoradiotherapy and cetuximab.
    Methods: Tumour and corresponding normal tissue from pre-therapeutic biopsies of 33 patients (m: 23, f: 10; median age: 61 years) with LARC treated in phase I and II trials with neoadjuvant chemoradiotherapy (cetuximab, irinotecan and capecitabine in combination with radiotherapy) were analysed by quantitative PCR.
    Results: Significantly higher expression of VEGFR-1/2 was found in tumour tissue, both in pre-treatment biopsies and in resected specimens after neoadjuvant chemoradiotherapy, compared to corresponding normal tissue. High TKTL1 expression correlated significantly with disease-free survival. None of the markers influenced early response parameters such as tumour regression grading. Gene expression of the investigated markers was not correlated.
    Conclusion: High TKTL1 expression correlates with poor prognosis in terms of 3-year disease-free survival in patients with LARC treated with intensified neoadjuvant chemoradiotherapy, and may therefore serve as a molecular prognostic marker, which should be further evaluated in randomised clinical trials.
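
    For readers unfamiliar with quantitative PCR, relative expression of a gene in tumour versus corresponding normal tissue is commonly quantified with the 2^-(ddCt) method sketched below; the study's actual quantification scheme is not stated in the abstract, so this is an assumption.

```python
def relative_expression(ct_target_tumour, ct_ref_tumour,
                        ct_target_normal, ct_ref_normal):
    """Fold change of a target gene (e.g. TKTL1) in tumour versus normal
    tissue, normalized to a reference gene, via the standard 2^-(ddCt)
    method (assumed here, not stated in the abstract)."""
    d_ct_tumour = ct_target_tumour - ct_ref_tumour      # dCt in tumour
    d_ct_normal = ct_target_normal - ct_ref_normal      # dCt in normal tissue
    return 2.0 ** -(d_ct_tumour - d_ct_normal)          # fold change
```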

    SparseFormer: Attention-based Depth Completion Network

    Most pipelines for Augmented and Virtual Reality estimate the ego-motion of the camera by creating a map of sparse 3D landmarks. In this paper, we tackle the problem of depth completion, that is, densifying this sparse 3D map using RGB images as guidance. This remains a challenging problem due to the low-density, non-uniform, and outlier-prone 3D landmarks produced by SfM and SLAM pipelines. We introduce a transformer block, SparseFormer, that fuses 3D landmarks with deep visual features to produce dense depth. The SparseFormer has a global receptive field, making the module especially effective for depth completion with low-density and non-uniform landmarks. To address the issue of depth outliers among the 3D landmarks, we introduce a trainable refinement module that filters outliers through attention between the sparse landmarks.
    Comment: Accepted at CV4ARVR 202
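
    The fusion step can be sketched as cross-attention in PyTorch, with dense image features as queries and landmark tokens as keys and values; the layer sizes and the (u, v, depth) landmark encoding are illustrative, not the paper's exact architecture.

```python
import torch

class SparseFusionBlock(torch.nn.Module):
    """Dense per-pixel features attend to sparse landmark tokens, so every
    pixel has a global receptive field over the landmarks."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.landmark_proj = torch.nn.Linear(3, dim)   # (u, v, depth) -> token
        self.attn = torch.nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = torch.nn.LayerNorm(dim)

    def forward(self, img_feats, landmarks):
        """img_feats: (B, H*W, dim) visual features; landmarks: (B, N, 3)
        sparse landmarks projected into the image with their depths."""
        tokens = self.landmark_proj(landmarks)          # (B, N, dim)
        fused, _ = self.attn(img_feats, tokens, tokens) # queries are pixels
        return self.norm(img_feats + fused)             # residual connection
```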

    Danish Airs and Grounds: A Dataset for Aerial-to-Street-Level Place Recognition and Localization

    Place recognition and visual localization are particularly challenging in wide-baseline configurations. In this paper, we contribute the Danish Airs and Grounds (DAG) dataset, a large collection of street-level and aerial images targeting such cases. Its main challenge lies in the extreme viewing-angle difference between query and reference images, with consequent changes in illumination and perspective. The dataset is larger and more diverse than currently available public data, covering more than 50 km of road in urban, suburban, and rural areas. All images are associated with accurate 6-DoF metadata that allows the benchmarking of visual localization methods. We also propose a map-to-image re-localization pipeline that first estimates a dense 3D reconstruction from the aerial images and then matches query street-level images to street-level renderings of the 3D model. The dataset can be downloaded at: https://frederikwarburg.github.io/DAG
    Comment: Submitted to RA-L (IROS
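
    The proposed pipeline reduces to three stages, outlined below with injected placeholder callables, since the abstract does not name concrete components.

```python
def relocalize(query_image, aerial_images, candidate_poses,
               reconstruct, render, match):
    """reconstruct, render and match are injected callables standing in for
    the reconstruction, rendering, and matching stages; they are placeholders
    for illustration, not DAG's actual API."""
    model = reconstruct(aerial_images)                     # dense 3D model
    renders = [render(model, pose) for pose in candidate_poses]
    return match(query_image, renders, candidate_poses)    # best 6-DoF pose
```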